Maximum likelihood fitting Lab

Paulo Inácio Prado
2023-02-02

Preparation

In this lab we will use a data set with the number of species of ants recorded in vegetation patches of different sizes in California (Suarez et al. 2008). We will use this data to fit different models for the species-area relationship.

Load the required libraries and the data, which is available in the supplementary material of Ohyama et al. (2021):

## Packages used in this lab
library(bbmle) ## mle2, AICctab
library(sads)  ## plotprofmle, likelregions

## all data sets compiled by Ohyama et al.
raw <- read.csv("https://github.com/piLaboratory/bie5781/raw/master/data/jbi14149-sup-0003-appendixs3.csv")
## Suarez et al data
suarez <- raw[raw$Study_ID=="23", c("Island_area", "Species.Richness")]

Model fitting

One of the first models proposed to describe the relationship between the area of an island or of a habitat patch (\(A\)) and the number of species that this area harbors (\(S\)) is the power function:

\[ S \, = \, \alpha A^{\beta_1}\]

Taking the logarithm of both sides of this equation we have a linear function:

\[ \ln S \, = \, \beta_0 + \beta_1 \ln A \]

Where \(\beta_0= \ln \alpha\). Thus, we can evaluate if this model approximates the observed relationship with a simple linear regression:

\[ \begin{align} \ln S_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_{0} + \beta_1 \ln A_i \end{align} \]
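As a quick numerical check of the algebra above, with arbitrary parameter values (our choices below), both expressions give the same result:

## Numerical check of the log-linearization (arbitrary values)
alpha <- 2; b1 <- 0.25; A <- 100
log(alpha * A^b1)        ## ln S from the power function
log(alpha) + b1*log(A)   ## beta0 + beta1 * ln A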

In fact, the species-area relationship looks linear in log-scale:

plot(Species.Richness ~ Island_area, data= suarez,
     xlab = "patch area (m2)", ylab = "Number of species", log="xy")

We will use the maximum likelihood method (ML) to fit the linear regression model to data, and then we will compare the results with the least squares (LS) fit.

Maximum likelihood method

Defining the log-likelihood function in R

To perform the ML fitting, we will use a computer optimization routine to find the values of the parameters \(\beta_0\), \(\beta_1\) and \(\sigma\) that minimize the negative log-likelihood function \(L_y\). We start by creating a function in R that calculates this quantity for the data:

LL1 <- function(beta0, beta1, sigma){
    mu <- beta0 + beta1*log(suarez$Island_area)
    -sum( dnorm(x = log(suarez$Species.Richness), mean = mu, sd = sigma, log=TRUE) )
}

The command above creates an object of class function, with the three parameters of the linear model as arguments. The first line of the function body calculates the expected value \(\mu_i\) for any values of \(\beta_0\) and \(\beta_1\) you supply. The second line evaluates the normal density at each observed value of \(\ln S\), takes the logarithm of these densities, sums them up, and returns the negative of this sum. For instance, the negative log-likelihood of this model for this data using \(\beta_0 = 1\), \(\beta_1 = 1\) and \(\sigma = 1\) is

LL1(beta0 = 1, beta1 = 1, sigma =1)
[1] 2432.565

Finding the minimum of the log-likelihood function

Now we call an optimization routine to minimize the function defined above. We will use the front-end function mle2, from the package bbmle. The function mle2 is tailored to the specific problem of numerically minimizing negative log-likelihood functions. You can find more details in the help page of mle2, and also in the vignette of the bbmle package.

Run the optimization and store the fit in an object called ml1:

ml1 <- mle2(LL1, start = list(beta0 = 0, beta1 = 0.1, sigma = 0.5))
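Under the hood, mle2 calls a general-purpose numerical optimizer (optim, by default). Just for comparison, here is a minimal sketch of the same minimization done directly with optim; the wrapper LL1.vec and the object op1 are names we made up, and no constraint keeps sigma positive:

## The same minimization with base R's optim()
LL1.vec <- function(p) LL1(beta0 = p[1], beta1 = p[2], sigma = p[3])
op1 <- optim(c(beta0 = 0, beta1 = 0.1, sigma = 0.5), LL1.vec)
op1$par ## should be close to coef(ml1)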

This object has the usual methods available for fitted statistical models in R. For example, you get a summary of the fit with:

summary(ml1)

The estimated values of the parameters are the maximum likelihood estimates (MLE). Store them in a separate object:

ml1.cf <- coef(ml1)
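As a quick consistency check, the maximized log-likelihood reported by logLik should be minus the value of LL1 at these estimates:

logLik(ml1)
-LL1(beta0 = ml1.cf[1], beta1 = ml1.cf[2], sigma = ml1.cf[3])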

And then you can use these estimates to plot the fitted regression line:

plot(log(Species.Richness) ~ log(Island_area), data= suarez,
     xlab = "ln patch area (m2)", ylab = "ln Number of species")
abline(coef = ml1.cf, col ="blue")

Inference: likelihood profiles

How precise are the MLEs we got from this fit? We can check this with likelihood profiles. For models fitted with the function mle2, the sads package provides the function plotprofmle to plot these profiles:

ml1.p <- profile(ml1, alpha =0.05) ## calculate the profile
## A graphic window for 3 plots
par(mfrow = c(1,3))
## The plots
plotprofmle(ml1.p, ylim = c(0,3))

The plots show the negative log-likelihood rescaled to have its minimum at zero. The blue line marks the threshold \(L_y(\theta) = 2\). By a popular rule of thumb, parameter values whose rescaled negative log-likelihood falls below this threshold are all considered equally plausible, which defines a plausibility interval around each MLE. (This threshold roughly matches a 95% confidence interval, since twice the log-likelihood ratio approximately follows a chi-square distribution.) The function likelregions returns the limits of the plausibility intervals:

likelregions(ml1.p)
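For comparison, the same profile also provides conventional 95% confidence intervals through the confint method:

confint(ml1.p)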

The parameter of theoretical importance is the power coefficient \(\beta_1\). Its plausibility interval is similar to the range of values recorded for this coefficient in other mainland patch systems (see, for example, Fig. 2 in Ohyama et al. 2021).

Least-squares method

The least-squares method (LS) gives the maximum likelihood solution for a simple linear Gaussian regression. You can check this by fitting the same model with the function lm

## Fits the same model with LS
ls1 <- lm(log(Species.Richness) ~ log(Island_area), data= suarez)

and then comparing the coefficients of both fits:

(ls1.cf <- coef(ls1)) # LS
ml1.cf[1:2]

In simple cases like this, the limits of the frequentist confidence interval also fall close to those of the plausibility interval:

confint(ls1)
likelregions(ml1.p)

There is a systematic difference, though. The MLE of the \(\sigma\) parameter is the square root of the mean of the squared residuals:

\[\hat{\sigma}=\sqrt{\frac{\sum(y_i-\hat{y})^2}{n}}\]

You can check this:

## fitted values:
ml1.yhat <- ml1.cf[1] + ml1.cf[2]*log(suarez$Island_area)
## residuals
ml1.res <- log(suarez$Species.Richness) - ml1.yhat
## Square root of the mean squared residuals
sqrt( sum (ml1.res^2) / nrow(suarez) )
## compare with the MLE
ml1.cf[3]

The LS method estimates \(\sigma\) dividing the sum of squared residuals by \(n-2\), a bias correction for small samples:

\[\hat{\sigma}=\sqrt{\frac{\sum(y_i-\hat{y})^2}{n-2}}\]

Which you can also check:

## fitted values:
ls1.yhat <- ls1.cf[1] + ls1.cf[2]*log(suarez$Island_area)
## residuals
ls1.res <- log(suarez$Species.Richness) - ls1.yhat
## Square root of the mean squared residuals, with the n - 2 denominator
sqrt( sum (ls1.res^2) / (nrow(suarez)-2) )
## compare with the estimate from the fit
summary(ls1)$sigma
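The two estimates thus differ only by the factor \(\sqrt{n/(n-2)}\), which you can verify:

## MLE of sigma rescaled by the finite-sample factor
ml1.cf[3] * sqrt(nrow(suarez) / (nrow(suarez) - 2))
summary(ls1)$sigma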

Model selection

There are dozens of alternative models to describe the species-area relationship (e.g. Tjørve 2009, Dengler 2009). We will fit two of them and then use the Akaike Information Criterion (AIC) to check which one is best supported by the data.

In addition to the power function that we have already fitted, we will also try Gleason’s logarithmic function:

\[S = \beta_0 + \beta_1 \ln A\]

And Kobayashi’s (1975) model:

\[S = \beta_0 \ln \left(1 + \frac{A}{\beta_1}\right)\]

The response in these models is \(S\), so to compare the models we will refit the power model to \(S\) (instead of \(\ln S\), as we did above). We will keep the Gaussian distribution to describe the variation around the expected values. The power and Kobayashi models thus become the expected values of nonlinear Gaussian regressions, which we fit with the function nls:

## Power model
nls1 <- nls(Species.Richness ~ beta0*Island_area^beta1,
                 data = suarez, 
            start = list(beta0 = exp(ml1.cf[1]), beta1 = ml1.cf[2]))
## Kobayashi's model
nls2 <- nls(Species.Richness ~ beta0*log(1 + Island_area/beta1),
                 data = suarez, 
                 start = list(beta0 = exp(ml1.cf[1]), beta1 = 1))

This function performs better with good starting values. For these fits, the linear fit above gives us some nice initial guesses.
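A simple way to check that nls did not stop at a poor local solution is to refit from different, but still sensible, starting values. A sketch, with arbitrary guesses (the object nls1.check is a name we made up); if the fit is well behaved, both should agree:

## Refit the power model from other starting values (arbitrary guesses)
nls1.check <- nls(Species.Richness ~ beta0*Island_area^beta1,
                  data = suarez,
                  start = list(beta0 = 1, beta1 = 0.5))
coef(nls1.check)
coef(nls1)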

The logarithmic model is actually a linear function of \(\ln A\), so we could fit it with the lm function. But let’s do that again using optimization:

LL2 <- function(beta0, beta1, sigma){
    mu <- beta0 + beta1*log(suarez$Island_area)
    -sum( dnorm(x = suarez$Species.Richness, mean = mu, sd = sigma, log=TRUE) )
}

ml2 <- mle2(LL2, start = list(beta0=0, beta1 = 1, sigma = 3.5))
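Since this model is linear in \(\ln A\), a quick sanity check is to compare the optimization result with the corresponding lm fit:

## The same model fitted by least squares
coef(lm(Species.Richness ~ log(Island_area), data = suarez))
coef(ml2)[1:2]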

Let’s check the curves of the expected values under these three models:

## coefficients
nls1.cf <- coef(nls1)
nls2.cf <- coef(nls2)
ml2.cf <- coef(ml2)
plot(Species.Richness ~ Island_area,
     data = suarez, xlab = "Patch area (m2)",
     ylab = "Number of species")
curve(nls1.cf[1] * x^nls1.cf[2], add=TRUE, col=1)
curve(ml2.cf[1] + ml2.cf[2]*log(x), add=TRUE, col=2)
curve(nls2.cf[1] * log(1+ x/nls2.cf[2]), add=TRUE, col =3)
legend("topleft", c("Power", "Logarithmic", "Kobayashi"),
       lwd=1, col = 1:3, bty="n")

AIC

It is not easy to tell by eye which model performed best. Use the function AICctab to get a model selection table:

AICctab(nls1, ml2, nls2, mnames = c("Power", "Logarithmic", "Kobayashi"),
        base=TRUE, weights = TRUE, nobs = nrow(suarez))
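AICctab reports the small-sample corrected criterion AICc, which adds to AIC a penalty that vanishes as the sample size \(n\) grows relative to the number of parameters \(K\):

\[\text{AICc} \, = \, -2 \ln L + 2K + \frac{2K(K+1)}{n - K - 1}\]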

By our usual rule of thumb, models with \(\Delta \text{AICc} \leq 2\) are equally plausible. Therefore, we can say that this data set rules out only the logarithmic model.

Cross validation

AIC singles out the models that best approximate the unknown process that generated our data. Models best ranked by AIC should therefore provide the best predictions for future samples.

Of course, we could check this by taking another sample and seeing whether the selected model provides good predictions of the response from the new measurements of the predictor variables.

Cross-validation is a class of statistical methods that approximates this situation using only the sample at hand. The general idea is to split the data into a training set and a testing set. We fit the competing models to the training set and then use the fitted models to predict the response variable in the testing set. The goodness of fit is usually measured by the mean squared error (MSE), the average of the squared prediction errors. Models with small MSE are those with higher predictive power.
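In symbols, for a testing set with \(n\) observations, observed values \(y_i\), and predictions \(\hat{y}_i\):

\[\text{MSE} \, = \, \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]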

In leave-one-out cross-validation, the testing set is a single observation taken from the data set, and the training set is the entire data set minus that observation. The procedure is repeated leaving out each observation in turn.

There are many R packages for all flavors of cross-validation. But to illustrate, we will run a leave-one-out cross-validation with a loop that shows each step in (hopefully) easy-to-read code:

## A matrix to store results
results <- matrix(NA, ncol =3, nrow = nrow(suarez) ,
                  dimnames=list(NULL, c("Power", "Logarithmic", "Kobayashi")))

## A loop to run all fits leaving one observation out each time
for(i in 1:nrow(suarez)) {
    ## Leave one observation out
    data <- suarez[-i,]
    ## fit the models for each data with an observation left out
    m1 <- nls(Species.Richness ~ beta0*Island_area^beta1,
              data = data, 
              start = list(beta0 = nls1.cf[1], beta1 = nls1.cf[2]))
    m2 <- lm(Species.Richness ~ log(Island_area), data = data)
    m3 <- nls(Species.Richness ~ beta0*log(1 + Island_area/beta1),
              data = data, 
              start = list(beta0 = nls2.cf[1], beta1 = nls2.cf[2]))
    ## Predicted value for the observation left out
    m1.pred <- predict(m1, newdata = suarez[i,])
    m2.pred <- predict(m2, newdata = suarez[i,])
    m3.pred <- predict(m3, newdata = suarez[i,])
    ## Residuals
    results[i,] <- suarez$Species.Richness[i] - c(m1.pred, m2.pred, m3.pred)
}

And here we calculate the MSE of each model:

apply(results, 2, function(x) mean(x^2))
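If you prefer the prediction error in the original unit (number of species), take the square root of the MSE, known as the root mean squared error (RMSE):

sqrt( apply(results, 2, function(x) mean(x^2)) )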

As expected, the ranking is quite similar to that of the model selection with AIC.

References